Performance Comparison of Imputation Methods Using Machine Learning Techniques for Ordinal Missing Data

Abstract

Objectives: When missing values occur, complete case analysis can lead to biased results. In this paper, we discuss imputation methods based on machine learning techniques for missing values in ordinal variables.

Methods: We consider two machine learning techniques, the decision tree and the random forest, for imputing missing values. The decision tree treats the variables as ordinal, while the random forest treats them as nominal. In addition, we apply the cumulative logistic model. The results are compared in terms of empirical bias, mean squared error, and accuracy. The same methods are applied to the Korea National Health and Nutrition Examination Survey.

Results: With five categories, the imputation methods perform better than complete case analysis. […] shows lower bias, while […] shows higher accuracy. With three categories, […] produces better performance in all respects. In this study, we also identified […] if […] analysis. The random forest shows the best performance, and the parametric method performs similarly to the decision tree.

Conclusions: Imputing missing values can reduce bias and improve performance. If possible, it is recommended to impute with a method that reflects the ordering of the categories. If not, the variables should at least be treated as nominal before imputation.
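
The abstract contrasts treating a variable as ordinal (decision tree) with treating it as nominal (random forest). One way to see what that choice means, assuming binary splits as in CART-style trees, is to count the candidate splits on a J-category predictor: an ordinal treatment only allows splits that keep the order (J - 1 of them), whereas a nominal treatment allows any partition into two non-empty groups (2^(J-1) - 1 of them). The sketch below enumerates both; the function names are hypothetical and this is not the authors' implementation.

```python
from itertools import combinations
from typing import Hashable, Sequence

# A binary split: (categories sent left, categories sent right).
Split = tuple[tuple[Hashable, ...], tuple[Hashable, ...]]


def ordinal_splits(categories: Sequence[Hashable]) -> list[Split]:
    """Order-preserving splits {c1..ck} vs {c(k+1)..cJ}; there are J - 1 of them."""
    cats = list(categories)
    return [(tuple(cats[:k]), tuple(cats[k:])) for k in range(1, len(cats))]


def nominal_splits(categories: Sequence[Hashable]) -> list[Split]:
    """All partitions of distinct categories into two non-empty groups.

    There are 2**(J - 1) - 1 of them. The first category is pinned to the
    left group so each partition is listed once, not also as its mirror.
    """
    cats = list(categories)
    if not cats:
        return []
    first, rest = cats[0], cats[1:]
    splits: list[Split] = []
    for r in range(len(rest)):  # r < len(rest) keeps the right group non-empty
        for combo in combinations(rest, r):
            left = (first, *combo)
            right = tuple(c for c in rest if c not in combo)
            splits.append((left, right))
    return splits


if __name__ == "__main__":
    five = [1, 2, 3, 4, 5]  # e.g. a 5-category ordinal item coded 1..5
    print(len(ordinal_splits(five)), len(nominal_splits(five)))  # 4 15
```

With five categories the nominal treatment searches 15 candidate partitions instead of 4, including ones such as {1, 5} vs {2, 3, 4} that ignore the ordering; with three categories the counts are 2 and 3.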
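
For the cumulative logistic (proportional-odds) model, a common parameterization is logit P(Y <= j | x) = theta_j - x'beta for j = 1, ..., J-1, with strictly increasing thresholds theta_1 < ... < theta_(J-1); each category probability is the difference of adjacent cumulative probabilities. The sketch below assumes that sign convention and takes already-fitted coefficients as inputs (fitting is not shown). The coefficient values in the example are made up, and because the abstract does not say whether the modal category or a random draw was imputed, both options are shown as hypothetical helpers.

```python
import math
import random
from typing import Sequence


def logistic(z: float) -> float:
    """Standard logistic CDF, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def category_probs(x: Sequence[float], beta: Sequence[float],
                   thresholds: Sequence[float]) -> list[float]:
    """P(Y = j | x), j = 1..J, under logit P(Y <= j | x) = theta_j - x'beta."""
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    # Cumulative probabilities P(Y <= j); P(Y <= J) is 1 by definition.
    cum = [logistic(t - eta) for t in thresholds] + [1.0]
    # Each category probability is the difference of adjacent cumulative ones.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def impute_mode(probs: Sequence[float]) -> int:
    """Impute the most probable category (categories coded 1..J)."""
    return max(range(len(probs)), key=lambda j: probs[j]) + 1


def impute_draw(probs: Sequence[float], rng: random.Random) -> int:
    """Impute by drawing a category from the predicted distribution."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical fitted values for one record with a missing 4-category item.
    p = category_probs(x=[1.0, 0.5], beta=[0.8, -0.4], thresholds=[-1.0, 0.5, 2.0])
    print([round(q, 3) for q in p])
    print(impute_mode(p))  # 3
    print(impute_draw(p, random.Random(2022)))
```

Imputing the mode always returns the same category for identical covariates, whereas drawing from the predicted distribution preserves the spread of the categories; which behaviour is preferable depends on the analysis that follows.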
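
The comparison criteria named in the abstract are not defined there, so the sketch below assumes definitions that are common in simulation studies: empirical bias and mean squared error of an estimate (for example, the mean category score) across simulation replications, and accuracy as the share of imputed cells that match the held-out true category. The function names and example numbers are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def empirical_bias(estimates: Sequence[float], true_value: float) -> float:
    """Mean of the estimates over simulation replications minus the true value."""
    return fmean(estimates) - true_value


def empirical_mse(estimates: Sequence[float], true_value: float) -> float:
    """Mean squared deviation of the estimates from the true value."""
    return fmean((e - true_value) ** 2 for e in estimates)


def imputation_accuracy(imputed: Sequence[int], truth: Sequence[int]) -> float:
    """Share of imputed cells whose category equals the held-out true category."""
    if not truth or len(imputed) != len(truth):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(i == t for i, t in zip(imputed, truth)) / len(truth)


if __name__ == "__main__":
    # Made-up numbers: four replications estimating a mean category score of 2.0,
    # and four imputed cells compared with their deleted true values.
    est = [2.1, 1.9, 2.3, 2.0]
    print(empirical_bias(est, 2.0), empirical_mse(est, 2.0))
    print(imputation_accuracy([1, 2, 3, 3], [1, 2, 2, 3]))  # 0.75
```

Under these definitions the MSE equals the squared bias plus the (population) variance of the estimates across replications, so a method with lower bias does not necessarily have lower MSE.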

Similar Resources

Imputation of Missing Data Using Machine Learning Techniques

A serious problem in mining industrial data bases is that they are often incomplete, and a significant amount of data is missing, or erroneously entered. This paper explores the use of machine-learning based alternatives to standard statistical data completion (data imputation) methods, for dealing with missing data. We have approached the data completion problem using two well-known machine le...

Comparison of alternative imputation methods for ordinal data

In this paper, we compare alternative missing imputation methods in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, as are univariate and multivariate approaches. The first step consists of running a simulation study designed by varying the parameters of the CUB model, to con...

Performance evaluation of different estimation methods for missing rainfall data

There are numerous methods to estimate missing values of which some are used depending on the data type and regional climatic characteristics. In this research, part of the monthly precipitation data in Sarab synoptic station, east Azerbaijan province, Iran was randomly considered missing values. In order to study the effectiveness of various methods to estimate missing data, by seven classic s...

Comparison of missing value imputation methods for crop yield data

Most ecological data sets contain missing values, a fact which can cause problems in the analysis and limit the utility of resulting inference. However, ecological data also tend to be spatially correlated, which can aid in estimating and imputing missing values. We compared four existing methods of estimating missing values: regression, kernel smoothing, universal kriging, and multiple imputat...

Missing data imputation using statistical and machine learning methods in a real breast cancer problem

OBJECTIVES Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods that were used to predict recurrence in patients in an extensive real breast cancer data set. MATERIALS AND METHODS Imputation methods based...

Journal

Journal title: ?????????

Year: 2022

ISSN: ['1450-9148', '2406-1263']

DOI: https://doi.org/10.21032/jhis.2022.47.3.217